Analyzing taxonomic classification using extensible Markov models

Authors

  • Rao M. Kotamarti
  • Michael Hahsler
  • Douglas W. Raiford
  • Monnie McGee
  • Margaret H. Dunham
Abstract

MOTIVATION: As next-generation sequencing rapidly adds new genomes, their correct placement in the taxonomy needs verification. However, current methods for confirming the classification of a taxon, or for suggesting a revision of a potential misplacement, rely on computationally intensive multi-sequence alignment followed by iterative adjustment of the distance matrix. Because of intra-heterogeneity issues with the 16S rRNA marker, no classifier is available at the sub-genus level that could readily suggest a classification for a novel 16S rRNA sequence. Metagenomics further complicates the issue by generating fragmented 16S rRNA sequences. This article proposes a novel alignment-free method for representing microbial profiles using extensible Markov models (EMMs), with an extended Karlin-Altschul statistical framework similar to that of the classic alignment paradigm. We propose a log-odds (LOD) score classifier based on the Gumbel difference distribution that confirms correct classifications with statistical significance qualifications and suggests revisions where necessary.

RESULTS: We tested our method by generating a sub-genus level classifier, with which we re-evaluated the classifications of 676 microbial organisms using the NCBI FTP database for 16S rRNA. The results confirm the current classification for all genera while ascertaining significance at the 95% level. Furthermore, this novel classifier isolates heterogeneity issues to a mere 12 strains while confirming classifications with significance qualification for the remaining 98%. The models require less memory than multi-sequence alignments and have better time complexity than current methods. The classifier operates at the sub-genus level and thus outperforms the naive Bayes classifier of the Ribosomal Database Project, where much of the taxonomic analysis is available online. Finally, using information redundancy in model building, we show that the method applies to metagenomic fragment classification of 19 Escherichia coli strains.

AVAILABILITY AND IMPLEMENTATION: Source code and binaries are freely available for download at http://lyle.smu.edu/IDA/EMMSA/, implemented in Java and supported on MS Windows.
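The idea of an alignment-free log-odds classifier mentioned in the abstract can be illustrated with a much simpler stand-in. The sketch below is not the authors' EMM method and includes no Karlin-Altschul or Gumbel significance statistics: it assumes a plain first-order Markov chain over nucleotides per class, a uniform background model, and add-one smoothing. All function names, labels, and training strings are hypothetical.

```python
import math

ALPHABET = "ACGT"

def train_transitions(sequences, pseudocount=1.0):
    """Estimate first-order transition probabilities P(b | a) with smoothing."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a in counts and b in counts[a]:  # skip ambiguous bases such as N
                counts[a][b] += 1
    probs = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in ALPHABET}
    return probs

def log_odds(seq, probs, background=0.25):
    """Sum log2(P_model(b | a) / P_background(b)) over all transitions in seq."""
    score = 0.0
    for a, b in zip(seq, seq[1:]):
        if a in probs and b in probs[a]:
            score += math.log2(probs[a][b] / background)
    return score

def classify(seq, models):
    """Return the label whose model gives the highest log-odds score."""
    return max(models, key=lambda label: log_odds(seq, models[label]))

# Toy training data (made up for illustration, not real 16S rRNA sequences).
models = {
    "AT_rich": train_transitions(["ATATATATAT", "TATATATTAT"]),
    "GC_rich": train_transitions(["GCGCGCGGCG", "CGCGCGCCGC"]),
}
label = classify("ATATAT", models)
```

With these toy models, `classify("ATATAT", models)` selects `"AT_rich"`, because the query's transitions are frequent in that model and score zero under the other. A real classifier in the spirit of the abstract would additionally attach a significance estimate to the score difference between competing models.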




Journal:
  • Bioinformatics

Volume 26, Issue 18

Pages: -

Publication date: 2010